AITopics | large-scale nearest neighbor classification

Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing SystemsDec-26-2025, 02:03:33 GMT

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter. Numerical studies have verified the theoretical findings.

convergence, large-scale nearest neighbor classification, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reviews: Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing SystemsJan-27-2025, 16:38:14 GMT

The paper presents a simple and clear algorithm of a divide-and-conquer scheme of distributed classification using the nearest neighbour framework. The general methods presented are not entirely new, but are accompanied by a crisp statistical analysis which proves tight convergence rates to the Bayes optimal classifier. The novelty in this algorithm is the complete distributed nature of it - the fact that very little information must be transferred between different computing units as opposed to previous work algorithms which compelled more data transference between them. The claims are clear and understandable, and the theoretical part is very satisfactory, as well as a nice complementary empirical section showing improved speeds (in the denoised case) as well as improved regret. On the other hand, I add a few points to explain the overall score I have given this submission: 1) The algorithm is rather similar to the denoising algorithm referred to in the paper.

algorithm, artificial intelligence, large-scale nearest neighbor classification, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.40)

Add feedback

Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing SystemsOct-11-2024, 03:22:16 GMT

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter.

convergence, large-scale nearest neighbor classification, prediction, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Rates of Convergence for Large-scale Nearest Neighbor Classification

Qiao, Xingye, Duan, Jiexin, Cheng, Guang

Neural Information Processing SystemsMar-19-2020, 01:02:05 GMT

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter.

convergence, large-scale nearest neighbor classification, prediction, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

large-scale nearest neighbor classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Rates of Convergence for Large-scale Nearest Neighbor Classification

Reviews: Rates of Convergence for Large-scale Nearest Neighbor Classification

Rates of Convergence for Large-scale Nearest Neighbor Classification

Rates of Convergence for Large-scale Nearest Neighbor Classification